Unlocking the Secret of Super-Speed Reading!
Inside every super-smart computer, there's a team working hard to write stories for you.
Professor Prose (The Target Model). He is super smart and never makes a mistake. But... he is very slow. He writes one... word... at... a... time.
Zippy (The Draft Model). She is small, full of energy, and loves to guess! She's super fast, but sometimes her guesses are a little silly.
Normally, computers write like Professor Prose. To write a sentence, the Professor has to think really hard about the first word, write it down, then think about the next one, and so on.
It's perfect, but it takes FOREVER!
What if Zippy helps? While the Professor is busy checking the first word, Zippy guesses the next 3 words!
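Then, the Professor just looks at Zippy's list and says "Yes, Yes, Yes!" or "No, use this word instead." When a guess is wrong, the Professor swaps in the right word and Zippy starts guessing again from there. Checking a list is much faster than writing from scratch!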
This is called Speculative Decoding!
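For readers who want to peek under the hood, here is a minimal Python sketch of one round of guessing and checking. It is a toy, not a real model: `STORY`, `professor_next`, `zippy_next`, and `speculative_round` are made-up names for illustration, the "models" just look words up in a fixed sentence, and the Professor checks guesses one by one here, whereas a real target model would check the whole list in a single pass.

```python
from typing import Callable, List

# Hypothetical toy "story" that the Professor would write on his own.
STORY = "the cat sat on the mat".split()

def professor_next(prefix: List[str]) -> str:
    """Toy target model: always knows the correct next word."""
    return STORY[len(prefix)] if len(prefix) < len(STORY) else "<end>"

def zippy_next(prefix: List[str]) -> str:
    """Toy draft model: usually right, but guesses 'ran' instead of 'sat'."""
    word = professor_next(prefix)
    return "ran" if word == "sat" else word

def speculative_round(
    prefix: List[str],
    target_next: Callable[[List[str]], str],
    draft_next: Callable[[List[str]], str],
    k: int = 3,
) -> List[str]:
    """One round: the draft guesses k words, the target keeps the good ones."""
    # Step 1: Zippy quickly guesses k words in a row.
    guesses: List[str] = []
    for _ in range(k):
        guesses.append(draft_next(prefix + guesses))

    # Step 2: the Professor checks the guesses in order.
    accepted: List[str] = []
    for guess in guesses:
        correct = target_next(prefix + accepted)
        if guess == correct:
            accepted.append(guess)    # "Yes!"
        else:
            accepted.append(correct)  # "No, use this word instead."
            break                     # later guesses are thrown away
    return accepted

if __name__ == "__main__":
    print(speculative_round([], professor_next, zippy_next))
    # → ['the', 'cat', 'sat']
```

In this toy run Zippy guesses "the", "cat", "ran". The Professor accepts the first two, replaces "ran" with "sat", and the round still ends with three correct words, so the final story is exactly what the Professor would have written alone.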
See the difference between "Normal" and "Speculative" speed!
Normal: Professor Prose works alone. Accurate but slow.
Speculative: Zippy guesses, Professor checks! Much faster.
Normal: Professor Prose picks up one block (word), checks it, and puts it down, then walks back to get the next one.
Speculative: Zippy hands the Professor a whole tray of 3 blocks! The Professor checks them all at once. One trip instead of three!
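The "one trip instead of three" idea can be written as a tiny bit of arithmetic. The sketch below assumes the best case, where the Professor says "Yes" to every block on the tray; `trips_needed` is a made-up helper name, and real speed-ups are smaller whenever some guesses are rejected.

```python
import math

def trips_needed(num_words: int, tray_size: int) -> int:
    """Best-case number of Professor trips when every guess is accepted."""
    return math.ceil(num_words / tray_size)

if __name__ == "__main__":
    print(trips_needed(3, 1))   # Normal: one block per trip → 3
    print(trips_needed(3, 3))   # Speculative: tray of 3 blocks → 1
    print(trips_needed(12, 3))  # A 12-word story with trays of 3 → 4
```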